Selective Provenance for Datalog Programs Using Top-K Queries

Authors

Daniel Deutch

Amir Gilad

Yuval Moskovitch

Abstract

Highly expressive declarative languages, such as datalog, are now commonly used to model the operational logic of data-intensive applications. The typical complexity of such datalog programs, and the large volume of data that they process, call for result explanation. Results may be explained through the tracking and presentation of data provenance, and here we focus on a detailed form of provenance (how-provenance), defining it as the set of derivation trees of a given fact. While informative, the size of such full provenance information is typically too large and complex (even when compactly represented) to allow displaying it to the user. To this end, we propose a novel top-k query language for querying datalog provenance, supporting selection criteria based on tree patterns and ranking based on the rules and database facts used in derivation. We propose an efficient novel algorithm based on (1) instrumenting the datalog program so that, upon evaluation, it generates only relevant provenance, and (2) efficient top-k (relevant) provenance generation, combined with bottom-up datalog evaluation. The algorithm computes in polynomial data complexity a compact representation of the top-k trees which may be explicitly constructed in linear time with respect to their size. We further experimentally study the algorithm performance, showing its scalability even for complex datalog programs where full provenance tracking is infeasible.
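To make the notion of top-k derivation trees concrete, the following is a minimal Python sketch, not the algorithm of the paper. It assumes a fixed two-rule reachability program, assumes that a tree is ranked by the sum of hypothetical costs attached to the rules and base facts it uses (lowest first), and omits tree-pattern selection entirely. The names `RULE_COST` and `top_k_derivations` are illustrative only.

```python
import heapq

# Hypothetical per-rule costs (an assumption made for this illustration):
#   r1: path(X, Y) :- edge(X, Y).
#   r2: path(X, Y) :- edge(X, Z), path(Z, Y).
RULE_COST = {"r1": 1, "r2": 2}


def top_k_derivations(edges, k):
    """Return, for each derived fact path(x, y), its k cheapest derivation trees.

    edges maps (x, y) to a positive cost for the base fact edge(x, y).
    The result maps (x, y) to a sorted list of (cost, tree) pairs, where a
    tree is a nested tuple: (rule_name, child, ...) with leaves ("edge", x, y).
    """
    best = {}
    while True:
        candidates = {}
        for (x, y), w in edges.items():
            leaf = ("edge", x, y)
            # Rule r1: a single edge derives path(x, y).
            candidates.setdefault((x, y), set()).add(
                (w + RULE_COST["r1"], ("r1", leaf))
            )
            # Rule r2: extend each known derivation of path(y, t) with edge(x, y).
            for (z, t), trees in best.items():
                if z != y:
                    continue
                for cost, tree in trees:
                    candidates.setdefault((x, t), set()).add(
                        (w + cost + RULE_COST["r2"], ("r2", leaf, tree))
                    )
        # Keep only the k cheapest trees per fact before the next round.
        new_best = {fact: heapq.nsmallest(k, c) for fact, c in candidates.items()}
        if new_best == best:  # fixpoint: no top-k list changed
            return best
        best = new_best


if __name__ == "__main__":
    edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 5}
    for cost, tree in top_k_derivations(edges, 2)[("a", "c")]:
        print(cost, tree)
```

Pruning to k trees per fact in every round keeps the intermediate state bounded even when cyclic data admits infinitely many derivation trees. Because every base-fact cost is assumed positive, longer derivations are strictly more expensive and the loop reaches a fixpoint.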


Similar resources

Circuits for Datalog Provenance

The annotation of the results of database queries with provenance information has many applications. This paper studies provenance for datalog queries. We start by considering provenance representation by (positive) Boolean expressions, as pioneered in the theories of incomplete and probabilistic databases. We show that even for linear datalog programs the representation of provenance using Boo...


Efficiently Computing Provenance Graphs for Queries with Negation

Explaining why an answer is in the result of a query or why it is missing from the result is important for many applications including auditing, debugging data and queries, and answering hypothetical questions about data. Both types of questions, i.e., why and why-not provenance, have been studied extensively. In this work, we present the first practical approach for answering such questions fo...


A PROV Encoding for Provenance Analysis Using Deductive Rules

PROV is a specification, promoted by the World Wide Web Consortium, for recording the provenance of web resources. It includes a schema, consistency constraints and inference rules on the schema, and a language for recording provenance facts. In this paper we describe an implementation of PROV that is based on the DLV Datalog engine. We argue that the deductive databases paradigm, which underpin...


Combined Tractability of Query Evaluation via Tree Automata and Cycluits (Extended Version)

We investigate parameterizations of both database instances and queries that make query evaluation fixed-parameter tractable in combined complexity. We introduce a new Datalog fragment with stratified negation, intensional-clique-guarded Datalog (ICG-Datalog), with linear-time evaluation on structures of bounded treewidth for programs of bounded rule size. Such programs capture in particular co...


Implementing Unified Why- and Why-Not Provenance Through Games

Using provenance to explain why a query returns a result or why a result is missing has been studied extensively. However, the two types of questions have been approached independently of each other. We present an efficient technique for answering both types of questions for Datalog queries based on a game-theoretic model of provenance called provenance games. Our approach compiles provenance r...



Journal:

PVLDB

Volume 8

Publication date: 2015
